Exemplar-Based Topic Detection in Twitter Streams

Authors

  • Ahmed Elbagoury
  • Rania Ibrahim
  • Ahmed K. Farahat
  • Mohamed S. Kamel
  • Fakhri Karray
Abstract

Detecting topics in Twitter streams has been gaining an increasing amount of attention. It can be of great support for communities struck by natural disasters, and could help companies and political parties understand users’ opinions and needs. Traditional approaches for topic detection, which focus on representing topics using terms, are negatively affected by the length limitation and the lack of context associated with tweets. In this work, we propose an Exemplar-based approach for topic detection, in which detected topics are represented using a few selected tweets. Using exemplar tweets instead of a set of keywords allows for an easy interpretation of the meaning of the detected topics. Experimental evaluation on benchmark Twitter datasets shows that the proposed topic detection approach achieves the best term precision while maintaining good topic recall and running time compared to other approaches.

In 2014, active users on Twitter reached more than 645 million and produced approximately 500 million tweets daily.¹ At this volume, users can easily miss important topics, so the need for mining this amount of unstructured data has become paramount. Topic detection in Twitter is an important mining task that has drawn a lot of attention during the past few years. It is defined as the task of discovering the underlying key topics that occur in a set of tweets. Some of the benefits of this task include discovering natural disasters as early as possible, helping political parties and companies understand users’ opinions, and improving content marketing by better understanding customer needs. Many approaches can be used to detect important topics that occur in a set of documents. However, many challenges arise when traditional approaches are applied to Twitter data. One of the challenges is the scalability of the processing methods to deal with the massive amounts of daily generated tweets.
Previous approaches developed in the literature for topic detection, such as (Deerwester 1990) and (Aiello et al. 2013), focus on identifying terms that represent the topic, regardless of how the terms can be connected so that an individual can easily interpret them, and regardless of whether noisy terms are included in the retrieved set. This motivated us to propose a fast and accurate Exemplar-based approach to detect topics in Twitter based on representing each topic by a single tweet. This Exemplar-based representation alleviates the aforementioned problems and allows for easy understanding of the retrieved topics. The rest of the paper is organized as follows: we first discuss some of the related work and then present the proposed approach for topic detection. After that, we describe the implementation details, experimental results and discussion. Finally, the last section concludes the paper.

¹ http://www.statisticbrain.com/twitter-statistics/

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Related resources

Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter

News and Twitter are sometimes closely correlated, while at other times each of them has a quite independent flow of information, due to the different concerns of their information sources. In order to effectively capture the nature of those two text streams, it is very important to model both their correlation and their difference. This paper first models their correlation by applying a time ...

User context as a source of topic retrieval in Twitter

Context in a web-based social system can be a valuable source of user information. On Twitter, context can be derived from user interactions, content streams and friendship. In this paper we focus on extracting user context by means of conversation patterns and user-generated Twitter lists. We present a novel approach which utilizes just the context to extract Twitter users’ topics of interest....

Finding Bursty Topics from Microblogs

Microblogs such as Twitter reflect the general public’s reactions to major events. Bursty topics from microblogs reveal what events have attracted the most online attention. Although bursty event detection from text streams has been studied before, previous work may not be suitable for microblogs because compared with other text streams such as news articles and scientific publications, microbl...

Real-Time Sentiment-Based Anomaly Detection in Twitter Data Streams

We propose an approach for real-time sentiment-based anomaly detection (RSAD) in Twitter data streams. Sentiment classification is used to split the data into independent streams (positive, neutral, and negative), which are then analyzed for anomalous spikes in the number of tweets. Four approaches for evaluating the data streams are studied, along with the parameters that adjust their sensitiv...

Recurrent Chinese Restaurant Process with a Duration-based Discount for Event Identification from Twitter

Due to the fast development of social media on the Web, Twitter has become one of the major platforms for people to express themselves. Because of the wide adoption of Twitter, events like breaking news and release of popular videos can easily catch people’s attention and spread rapidly on Twitter, and the number of relevant tweets approximately reflects the impact of an event. Event identifica...

Publication date: 2015